Goto

Collaborating Authors

 trust-region method



Trust-Region Methods with Low-Fidelity Objective Models

Angino, Andrea, Aurina, Matteo, Kopaničáková, Alena, Voigt, Matthias, Donatelli, Marco, Krause, Rolf

arXiv.org Machine Learning

We introduce two multifidelity trust-region methods based on the Magical Trust Region (MTR) framework. MTR augments the classical trust-region step with a secondary, informative direction. In our approaches, the secondary ``magical'' directions are determined by solving coarse trust-region subproblems based on low-fidelity objective models. The first proposed method, Sketched Trust-Region (STR), constructs this secondary direction using a sketched matrix to reduce the dimensionality of the trust-region subproblem. The second method, SVD Trust-Region (SVDTR), defines the magical direction via a truncated singular value decomposition of the dataset, capturing the leading directions of variability. Several numerical examples illustrate the potential gain in efficiency.



A Trust-Region Method for Graphical Stein Variational Inference

Pavlovic, Liam, Rosen, David M.

arXiv.org Machine Learning

Stein variational inference (SVI) is a sample-based approximate Bayesian inference technique that generates a sample set by jointly optimizing the samples' locations to minimize an information-theoretic measure of discrepancy with the target probability distribution. SVI thus provides a fast and significantly more sample-efficient approach to Bayesian inference than traditional (random-sampling-based) alternatives. However, the optimization techniques employed in existing SVI methods struggle to address problems in which the target distribution is high-dimensional, poorly-conditioned, or non-convex, which severely limits the range of their practical applicability. In this paper, we propose a novel trust-region optimization approach for SVI that successfully addresses each of these challenges. Our method builds upon prior work in SVI by leveraging conditional independences in the target distribution (to achieve high-dimensional scaling) and second-order information (to address poor conditioning), while additionally providing an effective adaptive step control procedure, which is essential for ensuring convergence on challenging non-convex optimization problems. Experimental results show our method achieves superior numerical performance, both in convergence rate and sample accuracy, and scales better in high-dimensional distributions, than previous SVI techniques.


A Proximal Modified Quasi-Newton Method for Nonsmooth Regularized Optimization

Diouane, Youssef, Habiboullah, Mohamed Laghdaf, Orban, Dominique

arXiv.org Artificial Intelligence

We develop R2N, a modified quasi-Newton method for minimizing the sum of a $\mathcal{C}^1$ function $f$ and a lower semi-continuous prox-bounded $h$. Both $f$ and $h$ may be nonconvex. At each iteration, our method computes a step by minimizing the sum of a quadratic model of $f$, a model of $h$, and an adaptive quadratic regularization term. A step may be computed by a variant of the proximal-gradient method. An advantage of R2N over trust-region (TR) methods is that proximal operators do not involve an extra TR indicator. We also develop the variant R2DH, in which the model Hessian is diagonal, which allows us to compute a step without relying on a subproblem solver when $h$ is separable. R2DH can be used as standalone solver, but also as subproblem solver inside R2N. We describe non-monotone variants of both R2N and R2DH. Global convergence of a first-order stationarity measure to zero holds without relying on local Lipschitz continuity of $\nabla f$, while allowing model Hessians to grow unbounded, an assumption particularly relevant to quasi-Newton models. Under Lipschitz-continuity of $\nabla f$, we establish a tight worst-case complexity bound of $O(1 / \epsilon^{2/(1 - p)})$ to bring said measure below $\epsilon > 0$, where $0 \leq p < 1$ controls the growth of model Hessians. The latter must not diverge faster than $|\mathcal{S}_k|^p$, where $\mathcal{S}_k$ is the set of successful iterations up to iteration $k$. When $p = 1$, we establish the tight exponential complexity bound $O(\exp(c \epsilon^{-2}))$ where $c > 0$ is a constant. We describe our Julia implementation and report numerical experience on a basis-pursuit problem, image denoising, minimum-rank matrix completion, and a nonlinear support vector machine. In particular, the minimum-rank problem cannot be solved directly at this time by a TR approach as corresponding proximal operators are not known analytically.


TRBoost: A Generic Gradient Boosting Machine based on Trust-region Method

Luo, Jiaqi, Wei, Zihao, Man, Junkai, Xu, Shixin

arXiv.org Artificial Intelligence

Gradient Boosting Machines (GBMs) have demonstrated remarkable success in solving diverse problems by utilizing Taylor expansions in functional space. However, achieving a balance between performance and generality has posed a challenge for GBMs. In particular, gradient descent-based GBMs employ the first-order Taylor expansion to ensure applicability to all loss functions, while Newton's method-based GBMs use positive Hessian information to achieve superior performance at the expense of generality. To address this issue, this study proposes a new generic Gradient Boosting Machine called Trust-region Boosting (TRBoost). In each iteration, TRBoost uses a constrained quadratic model to approximate the objective and applies the Trust-region algorithm to solve it and obtain a new learner. Unlike Newton's method-based GBMs, TRBoost does not require the Hessian to be positive definite, thereby allowing it to be applied to arbitrary loss functions while still maintaining competitive performance similar to second-order algorithms. The convergence analysis and numerical experiments conducted in this study confirm that TRBoost is as general as first-order GBMs and yields competitive results compared to second-order GBMs. Overall, TRBoost is a promising approach that balances performance and generality, making it a valuable addition to the toolkit of machine learning practitioners.


Fully Stochastic Trust-Region Sequential Quadratic Programming for Equality-Constrained Optimization Problems

Fang, Yuchen, Na, Sen, Mahoney, Michael W., Kolar, Mladen

arXiv.org Machine Learning

We propose a trust-region stochastic sequential quadratic programming algorithm (TR-StoSQP) to solve nonlinear optimization problems with stochastic objectives and deterministic equality constraints. We consider a fully stochastic setting, where in each iteration a single sample is generated to estimate the objective gradient. The algorithm adaptively selects the trust-region radius and, compared to the existing line-search StoSQP schemes, allows us to employ indefinite Hessian matrices (i.e., Hessians without modification) in SQP subproblems. As a trust-region method for constrained optimization, our algorithm needs to address an infeasibility issue -- the linearized equality constraints and trust-region constraints might lead to infeasible SQP subproblems. In this regard, we propose an \textit{adaptive relaxation technique} to compute the trial step that consists of a normal step and a tangential step. To control the lengths of the two steps, we adaptively decompose the trust-region radius into two segments based on the proportions of the feasibility and optimality residuals to the full KKT residual. The normal step has a closed form, while the tangential step is solved from a trust-region subproblem, to which a solution ensuring the Cauchy reduction is sufficient for our study. We establish the global almost sure convergence guarantee for TR-StoSQP, and illustrate its empirical performance on both a subset of problems in the CUTEst test set and constrained logistic regression problems using data from the LIBSVM collection.


Trust-Region Algorithms for Training Responses: Machine Learning Methods Using Indefinite Hessian Approximations

Erway, Jennifer B., Griffin, Joshua, Marcia, Roummel F., Omheni, Riadh

arXiv.org Machine Learning

Machine learning (ML) problems are often posed as highly nonlinear and nonconvex unconstrained optimization problems. Methods for solving ML problems based on stochastic gradient descent are easily scaled for very large problems but may involve fine-tuning many hyper-parameters. Quasi-Newton approaches based on the limited-memory Broyden-Fletcher-Goldfarb-Shanno (BFGS) update typically do not require manually tuning hyper-parameters but suffer from approximating a potentially indefinite Hessian with a positive-definite matrix. Hessian-free methods leverage the ability to perform Hessian-vector multiplication without needing the entire Hessian matrix, but each iteration's complexity is significantly greater than quasi-Newton methods. In this paper we propose an alternative approach for solving ML problems based on a quasi-Newton trust-region framework for solving large-scale optimization problems that allow for indefinite Hessian approximations. Numerical experiments on a standard testing data set show that with a fixed computational time budget, the proposed methods achieve better results than the traditional limited-memory BFGS and the Hessian-free methods.


A Distributed Second-Order Algorithm You Can Trust

Dünner, Celestine, Lucchi, Aurelien, Gargiani, Matilde, Bian, An, Hofmann, Thomas, Jaggi, Martin

arXiv.org Machine Learning

Due to the rapid growth of data and computational resources, distributed optimization has become an active research area in recent years. While first-order methods seem to dominate the field, second-order methods are nevertheless attractive as they potentially require fewer communication rounds to converge. However, there are significant drawbacks that impede their wide adoption, such as the computation and the communication of a large Hessian matrix. In this paper we present a new algorithm for distributed training of generalized linear models that only requires the computation of diagonal blocks of the Hessian matrix on the individual workers. To deal with this approximate information we propose an adaptive approach that - akin to trust-region methods - dynamically adapts the auxiliary model to compensate for modeling errors. We provide theoretical rates of convergence for a wide class of problems including L1-regularized objectives. We also demonstrate that our approach achieves state-of-the-art results on multiple large benchmark datasets.


A trust-region method for stochastic variational inference with applications to streaming data

Theis, Lucas, Hoffman, Matthew D.

arXiv.org Machine Learning

Stochastic variational inference allows for fast posterior inference in complex Bayesian models. However, the algorithm is prone to local optima which can make the quality of the posterior approximation sensitive to the choice of hyperparameters and initialization. We address this problem by replacing the natural gradient step of stochastic varitional inference with a trust-region update. We show that this leads to generally better results and reduced sensitivity to hyperparameters. We also describe a new strategy for variational inference on streaming data and show that here our trust-region method is crucial for getting good performance.